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EFFICIENT RARE-EVENT SIMULATION FOR THE MAXIMUM 
OF HEAVY-TAILED RANDOM WALKS 1 

By Jose Blanchet and Peter Glynn 

Harvard University and Stanford University 

Let (X n : n > 0) be a sequence of i.i.d. r.v.'s with negative mean. 
Set So = and define S n = Xi + ■ ■ ■ + X n . We propose an importance 
sampling algorithm to estimate the tail of M = maxISVi : n > 0} that 
is strongly efficient for both light and heavy-tailed increment distri- 
butions. Moreover, in the case of heavy-tailed increments and under 
additional technical assumptions, our estimator can be shown to have 
asymptotically vanishing relative variance in the sense that its coef- 
ficient of variation vanishes as the tail parameter increases. A key 
feature of our algorithm is that it is state-dependent. In the presence 
of light tails, our procedure leads to Siegmund's (1979) algorithm. 
The rigorous analysis of efficiency requires new Lyapunov-type in- 
equalities that can be useful in the study of more general importance 
sampling algorithms. 

1. Introduction. In this paper we consider the problem of efficient sim- 
ulation of first-passage time probabilities for heavy-tailed random walks 
(r.w.'s). More precisely, suppose that (S n : n > 0) is the r.w. generated by the 
sequence of independent and identically distributed (i.i.d.) random variables 
(r.v.'s) X = (X n : n > 1) (i.e., S n = S n -i + X n with So = 0). We assume that 
EX n < 0. Define M = max{S n : n > 0} and r(6) = inf{n > : S n > b}. We 
are interested in developing efficient simulation methodology to compute 

(1) P(T{b)<oo) = P{M>b), 

when b is large (i.e., the event {M > b} is rare) and X± is heavy-tailed. 
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We say that an unbiased simulation estimator R(b) for P(M > b) is 
strongly efficient if 

sup ER{bf/P(M > bf < oo. 

b>0 

Strong efficiency implies that the number of simulation runs required to esti- 
mate P(M > b) to a given relative accuracy is bounded in b. A weaker crite- 
rion is logarithmic efficiency, which implies that the number of replications 
required to estimate P(M > b) with a given relative accuracy grows at rate 
o(|logP(M > see Asmussen and Glynn (2007), Juneja and Shahabud- 
din (2006) or Bucklew (2004), Section 5.2, for a discussion of efficiency in 
rare-event simulation. A strongly efficient estimator is said to exhibit asymp- 
totically vanishing relative error when ER(b) 2 ~ P(M > b) 2 as b f oo (or, 
equivalently, when the coefficient of variation vanishes as b S oo). 

In this paper we develop an implementable state-dependent importance 
sampling algorithm that can be rigorously proved to possess asymptotically 
vanishing relative error. By "state-dependent," we mean that the importance 
sampling algorithm generates the next increment of the random walk from a 
distribution that depends on the walk's current state (i.e., location). This is 
the first strongly efficient algorithm that has been developed for estimating 
the tail of M in the presence of general heavy-tailed increment distributions. 
Prior efficient algorithms require the increment distribution to be of M/G/l 
type with regularly varying or Weibull type right tails. 

A key idea is that our importance distribution is state-dependent. There 
is a long history of applications of state-dependent importance sampling to 
simulation problems. Perhaps the first related contributions are those by 
Hammersley and Morton (1954) and Rosenbluth and Rosenbluth (1955) in 
the context of molecular simulation; see also the text by Liu (2001) for ap- 
plications of sequential importance sampling in various scientific contexts. 
However, a general framework for rigorous analysis of these types of al- 
gorithms is still under development. In a sequence of recent papers, Paul 
Dupuis and Hui Wang [see, e.g., Dupuis and Wang (2004)] have proposed a 
general methodology that can be applied in the presence of large deviations 
theory for light-tailed systems. Our paper contributes to this general liter- 
ature by developing Lyapunov-type inequalities (see Theorem 2) that are 
particularly useful for the analysis of state-dependent algorithms. 

The general theory of importance sampling establishes that the theoret- 
ically optimal importance distribution (having zero variance) involves sam- 
pling from the conditional distribution of the random walk given {r(b) < oo}. 
Under this conditional distribution, the random walk has increment dis- 
tributions that are state-dependent. However, we cannot implement this 
zero variance sampling scheme because the state-dependent increment dis- 
tribution requires explicit knowledge of the function u*(-) = P(r(-) < oo). 
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Our approach involves using asymptotic approximations for u*(-) to obtain 
an implementable state-dependent change-of-measure that closely approxi- 
mates the true conditional distribution. In the current G/G/l setting, the 
asymptotic approximation for tt*(-) is 



as b y oo. An important step in our approach is to use (2) in order to 
construct a function v(-) such that 



as b y oo. Note that if v = u* , the above difference vanishes. The above 
convergence rate [namely, that associated with (3)] is a convenience in de- 
veloping our simulation algorithm, but is not necessary (see Proposition 
3 and Theorem 3). We show that a v(-) satisfying (3) can be constructed 
using (2) whenever X belongs to the class S* of heavy-tailed distributions — 
which is slightly smaller than the class of subexponential distributions but 
includes regularly varying, Weibull, lognormal and many more distributions 
as special cases; see Assumption A in Section 3 for a precise definition. 

The problem that we address here is motivated by applications in queueing 
and insurance. The distribution of M is of great interest in queueing theory 
as it coincides with the steady-state waiting time distribution of the single- 
server G/G/l queue. In addition, the first passage time probability displayed 
in (1) is of central interest in the context of insurance risk. In particular, 
such a first passage time probability can be interpreted as the probability 
that an insurer receiving premiums at a constant rate is eventually ruined 
when subject to a renewal arrival process of i.i.d. claims. When the claim 
distribution is heavy-tailed, the resulting calculation is exactly of the type 
discussed in this paper. Statistical evidence suggests that such heavy-tailed 
distributions frequently arise in practice and are a convenient vehicle for 
capturing many of the key stylized features that are present in observed 
claim sizes [see, e.g., Embrechts, Kluppelberg and Mikosch (1997) and Adler, 
Feldman and Taqqu (1998)]. 

The first efficient rare event simulation algorithm for the tail of M was 
suggested by Siegmund (1976), who was motivated by the first passage 
time interpretation displayed in (1) and its connection to one-sided sequen- 
tial probability ratio tests in the context of statistical sequential analysis. 
Siegmund's algorithm applies only to light-tailed r.w.'s and involves an im- 
portance distribution corresponding to a r.w. with state-independent incre- 
ments. Our proposed strongly efficient algorithm is consistent with recent 
results of Bassamboo, Juneja and Zeevi (2006), who show that no state inde- 
pendent efficient importance sampling algorithm for computing (1) can exist 
in the (regularly varying) heavy-tailed setting. Another key feature that is 





(3) 
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present in the light-tailed context is the ability to fully leverage the existing 
theory of large deviations. A complicating factor in the heavy-tailed setting 
is that the large deviations literature is not applicable to such problems. 
Asmussen, Binswanger and Hojgaard (2000) provide a number of examples 
and counterexamples to illustrate the additional difficulties that arise in the 
heavy-tailed environment. 

As noted above, rare-event simulation algorithms for heavy-tailed distri- 
butions have been previously developed in the context of the M/G/l queue. 
The first logarithmically efficient simulation algorithm for estimation of (1) 
was given in Asmussen and Binswanger (1997) and was based on the idea 
of conditional Monte Carlo (and not importance sampling). Logarithmic ef- 
ficiency for their algorithm was established for regularly varying tails and 
was shown to fail for Weibull-type heavy tails. Subsequently, Asmussen, 
Binswanger and Hojgaard (2000) developed simulation estimators for the 
M/G/l queue based on importance sampling ideas that are provably loga- 
rithmically efficient for both regularly varying and Weibull-type tails. Juneja 
and Shahabuddin (2002) also developed logarithmically efficient importance 
sampling schemes based on a suitable twisting of the M/G/l service time 
distribution's hazard rate. More recently, Asmussen and Kroese (2006) pro- 
posed other logarithmically efficient importance sampling algorithms for the 
M/G/l queue that seem to have excellent performance in practice. In ad- 
dition, they developed a conditional Monte Carlo estimator that is strongly 
efficient for both regularly varying tails and certain Weibull type heavy- 
tails. Dupuis, Leder and Wang (2006) proposed a state-dependent impor- 
tance sampling algorithm that is strongly efficient for a regularly varying 
M/G/l queue. All the above algorithms take advantage of the fact that 
the ladder height distribution for the M/G/l queue is explicitly known. In 
contrast, no such explicit computations are possible for the class of G/G/l 
models considered here. This significantly complicates both the development 
and the theoretical analysis of efficient rare-event algorithms for this class 
of problems. Indeed, we developed our Lyapunov bounds largely in order to 
provide a suitable verification tool for bounding the variances (as required to 
establish strong efficiency) for the algorithm considered here. More recently, 
Blanchet, Glynn and Liu (2007) have used this Lyapunov technique to study 
an alternative importance sampling algorithm for the G/G/l queue that is 
based on mixture sampling (rather than on a Markovian importance sampler 
having a transition kernel based explicitly on the approximation v as is the 
case here). This alternative algorithm, while typically simpler to implement 
than the approach described here (because generating increments from a 
mixture distribution is usually easier than the variate generation schemes 
required here), applies only to regularly varying distributions (rather than 
the class S* covered by this paper's algorithm). 
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The paper is organized as follows. Section 2 introduces a general technique 
to study efficient state-dependent importance sampling algorithms for com- 
puting first passage time probabilities of general state-space Markov chains 
and recovers Siegmund's algorithm as a direct application of the basic ideas 
underlying our procedure. Section 3 introduces the precise technical assump- 
tions under which we develop our methodology and provides both the proof 
of strong efficiency for our importance sampling estimator and establishes 
its asymptotically vanishing relative error property. In Section 4 we discuss 
computational complexity issues associated with our algorithm, leading us 
to a study of the number of variate generations required to terminate our 
procedure. Additional practical observations and some results on simulation 
experiments are given in our final section. 

2. Efficient importance samplers for exit probabilities. The problem of 
computing the level crossing probability (1) can be viewed as a special 
case of computing an exit probability. To be specific, let Y = (Y n : n > 0) 
be a <Y-valued Markov chain (with stationary transition probabilities) and 
let Py(-) and E y (-) be the probability distribution and expectation oper- 
ator on the path-space of Y, conditional on Yq = y. For B C X, let T = 
inf{n > 0:Y n , € B} be the exit time from B c . For A C B, the probabil- 
ity u*(y) = P y (Yr £ A,T < oo) is called an "exit probability" (all the sets 
considered here are assumed measurable) . Note that the level crossing prob- 
ability (1) is the special case in which Y is given by the r.w. (S n :n > 0), 
X = [—00,00), B = {—00} U (6,00), A = (6,00) and y = 0. Because of the 
translation invariance of r.w., studying this problem as b S 00 is equivalent 
to fixing B = {—00} U [0, 00), A = (0, 00), setting y = —b and letting 6/00. 
With B and A fixed in this way, our goal is to efficiently compute u*(—b) as 
b y 00. This reformulation of the problem will form the basis of our analysis 
in the remainder of the paper. 

The following result is easily proved [see, e.g., Meyn and Tweedie (1993)]. 

Proposition 1. The function u* = (u*(y) : y G B c ) is the minimal non- 
negative solution to 



subject to the boundary conditions that u(z) = 1 for z € A and u(z) = for 
z<EBC)A c . 

As mentioned in the Introduction, the zero- variance importance distribu- 
tion for computing u*(y) is that associated with the conditional distribution 
P y (-\Yr € A,T < 00). Let T n = ct(Yq, . . . , Y n ) for n > 0. Our next result char- 
acterizes this conditional distribution. 
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Theorem 1. Suppose that u*(y) > for y € B c . Then, for each nonneg- 
ative Tt -measurable r.v. A, 

E y [A\Y T G A,T < oo] = E*A, 

where E*(-) is the expectation operator under which Y is a Markov chain 
having one-step transition kernel 

P*(y,dz) = P y (Y l edz)^ y 

for y £B C , zEX. 

Proof. Note that I(T = n)A = X n (Y , . . . ,Y n ) for some (measurable) 
function A n : X n+l — > [0, oo). Therefore, 

E y [A;T = n,Y T e A,T < oo] 
u*(y) 

X n (y,zi,.. . ,z n )u*(z n )P(y,dzi) ■ ■ ■ P(z n -i,dz n ) 
B^x---xB c xA U*(y) 

P{y,dzi)u*{zi) P{z 1 ,dz 2 )u*(z 2 ) 



\ n (y,zi,...,z n )- 



B c x-xB c xA ' ' u*(y) u*{z\) 

< P{z n -2,dz n -i)u*{z n -i) P(z n -l,^n)^*(^n) 

u*(z n - 2 ) u*(z n --]) 

= E* y [A-T = n\. 
Summing over n, we conclude that 

E[A\Y T G A, T < oo] = E* [A; T < oo]. 
Letting A = 1 establishes that Py(T < oo) = 1, proving the result. □ 

This theorem makes clear that the zero-variance importance sampling 
distribution for computing (1) corresponds to a random walk in which the 
increments have a state-dependent distribution. The above result suggests 
that a good importance sampling distribution can be obtained by simulating 
Y under transition dynamics that closely approximate those induced by the 
zero- variance importance distribution's transition kernel P*. 

Suppose that Q is the Markov transition kernel chosen by the simulator 
to compute the exit probability u*(y) = P y {Yr G A, T < oo) via importance 
sampling. Assume that (Q(y,dz) : y, z € B c U A) can be represented as 

Q(y, dz) = r{y, zy 1 P y {Y l G dz)I(y G B c , zGB c U A) 

+ 5 y (dz)I(yeA,zeA) 
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for some positive function r(-). Note that 
P y (Y T £A,T = n) 



E Q 

v 



I(Y T e A,T = n) Y[r(Yj-i,Yj) 
3=1 



where E®(-) is the expectation operator under which Y evolves according to 
the transition kernel Q, conditional on Yq = y. Summing over n, we conclude 
that u* can be represented as 



f{y) = E$ 



I(T<^)\{r(Y ] _ 1 ,Y J ) 

3=1 



An important step in any theoretical analysis of the estimator 

T 



(4) 



R = I(T<oc)l[r(Y^i,Y j ) 

3=1 



is to bound its variance. The variance, conditional on Yq = y, is given by 
s*(y) — u*(y) 2 , where s*(y) = E®R 2 . Since only s*(-) depends on the choice 
of the importance distribution, we focus on bounding this quantity. 



Theorem 2. 

(i) The function s* = (s*(y) :y £ B c ) is the minimal nonnegative solu- 
tion to 



s(y)=v(y)+ K(y,dz)s(z), 
Jb c 



for y 6 B c , where 



V(V) = / r(y,z)P y (Y 1 edz), 
J A 

K{y,dz)=r(y,z)P y (Y 1 £dz), 



for y,z £ B c . 

(ii) The function s* is given by 



oo 



s* = Y,K n V , 

n=0 



where K n (y,dz) = J BC K n - 1 (y,dy 1 )K(y 1 ,dz) for n>\, K°(y,dz) = S y {dz) 
and (K n r ] )(y)=J Bc K n (y,dz)r 1 (z). 
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(iii) Suppose that h = (h(y) : y G B c ) is a finite-valued nonnegative func- 
tion for which 

(5) (Kh)(y)<h(y)-ri(y) 
for y G B c . Then, s*(y) < h(y) for y £ B c . 

Proof. Part (ii) follows by expanding E^[R?I{T = n)] and summing 
over n using Fubini's theorem. Part (i) follows easily from (ii). 

For part (iii), first note that Kh must be finite- valued by virtue of (5). 
Induction based on applying K n to both sides of (5) establishes that K n h is 
finite-valued for n > 1. By applying K n h to (5) and using the fact that K n h 
is finite- valued for n > 1, we conclude that K n rj < K n h — K n+1 h for n > 0. 
Summing over < n < m and using the nonnegativity of h, we obtain the 
bound 



m 

if 

n=0 



K n r] <h- K m+L h < h. 

i 

The result follows by sending n f oo and using part (iii). □ 



We call the function h(-) a Lyapunov function and refer to bounds based 
on part (iii) of Theorem 2 as Lyapunov bounds on the second moment. 

Returning to the exit probability computations, suppose that v = (v(y) : y € 
X ) is chosen by the simulator to be a good approximation to u* = (u* (y)-y& 
X). In view of Theorem 2 above, it is then natural to consider simulating Y 
via the transition kernel 

(6) Q {yidz)=P{yyd2) ^ 

w(y) 

(for y £ B c , z £ B c U A), where w(y) is the normalization constant given by 

w (y) = / P(y,dz)v(z) 
Jb c ua 

(assumed to be finite). In this case, r(y, z) = w(y)/v(z). The following result 
provides a Lyapunov bound on the second moment s*(-) that is specifically 
suited to this setting. 

Proposition 2. Assume that w(y) > for y € B c and suppose that 
there exists a finite-valued function h : B c U A — ► [e, oo) satisfying 



(7) w(y) J v(z)h(z)P(y, dz) < h(y)v(y) 2 , 

for y G B c . If h(z) > 1 for z G A and v(z) > K > for z G A, then s*(y) < 
e~ l K~ 2 v{y) 2 h{y). 
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Proof. Put h(-) = k 2 h(-)v 2 (-) and note that (7) is equivalent to as- 
suming that 

(8) (Kh)(y) <h(y) - K - 2 w(y) f P(y,dz)w(y)h(z) 

J A 

for y eB c . But 

V (y)= [ P(y,dz)^l< [ K - 2 P(y,dz)w(y)v(z) 
Ja v{z) Ja 

<K- 2 w{y)e- 1 f P(y,dz)v(z)h(z), 
J A 

so that (8) implies that 

(Kh)(y) <h(y) - V (y) 
for y G B c . We now apply part (hi) of Theorem 2 to complete the proof. □ 

Suppose that v(-) has been chosen by the simulator to be within a constant 
multiple of «*(■), as occurs whenever v(-) has the same asymptotic behavior 
as u*(-). In this case, it follows that the importance sampling algorithm 
based on r(y, z) = w{y)/v{z) has bounded relative variance [i.e., the ratio of 
the variance to the square of u*(x)] across B c whenever the function h of 
Proposition 2 can be chosen to be bounded. On the other hand, if h grows 
at a suitable rate [e.g., h(y) = \ log(v(y))\ 1 ^ 2 ], the logarithmic efficiency of 
the importance sampler can be assured. 

To illustrate, consider the problem of estimating 

u*(-b) = P(r(0) < oo|5 = -b) 

for b > in the light-tailed setting. In particular, suppose that there exists a 
positive root 9* of E[exp(9*X 1 )] = 1 for which E[X 1 exp(d*X 1 )] < oo. If X 1 
is nonlattice, then it is known that 

u*(-b) ~cexp(-0*6) 

for some positive constant c; see, for example, Asmussen (2003), page 365. 
The natural choice for v is, of course, v(z) = exp(6*z), in which case w(z) = 
exp(6*z). If we put h(y) = 1 for y 6 R, Proposition 2 applies, yielding the 
bound 

s*(-&)<exp(-20*&). 

Hence, this importance sampling algorithm [which is precisely the one pro- 
posed by Siegmund (1976)] is strongly efficient. 
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3. Elements of our algorithm for heavy-tailed r.w.'s. We shall explore 
how to adapt the ideas discussed in the previous sections to the case of a ran- 
dom walk with heavy-tailed increment distributions. We need the following 
definitions. Set X + = max(X, 0) and X~ = max(— X, 0). 

Definition 1. A nonnegative r.v. Z is said to be subexponential if 

P(Z 1 + Z 2 >t)~2P(Z>t), 

as t y co where Z\ and Z 2 are independent copies of Z . A r.v. X is said to 
be subexponential if X + is subexponential. 

Definition 2. A nonnegative r.v. Z belongs to the family S* if 

2EZP(Z > t) ~ [ P(Z > t - s)P{Z > s) ds 
Jo 

as t y co. In addition, a r.v. X is in 5* if X + is in S* . 

Definition 3. A r.v. X is said to possess a long tail if for every constant 

P{X>t + a)~P{X>t) 

as t y co. 

It can be shown that if Z is in S*, then it must be subexponential. Also, 
any subexponential r.v. possesses a long tail. The class S* of random vari- 
ables includes, as particular cases, regularly varying, lognormal and Weibull- 
type distributions among many others. For more on the specific properties 
of various types of heavy-tailed distributions, see Embrechts, Kliippelberg 
and Mikosch (1997), Section 1.4. 

The following assumption will be imposed throughout the rest of the 
paper. 

Assumption A. Assume that A"+ belongs to S*, that is, 

2EX+P(X n > t) ~ f P(X n > t - s)P(X n > s) ds 
Jo 

as t y co. 

If X belongs to S* , then both the distribution of X and its integrated tail 

f°°P(X>s) , 

L ~^x^ ds 
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are sub exponential [see Asmussen (2003), Section 10.9]. Under Assump- 
tion A, it is known [see, e.g., Asmussen (2003), page 296] that 

—1 f°° 

(9) u*(-b) = P(r(0) < oo\S = -b) ~ — / P(X > t) dt 

as b y oo. The previous result is also known in the literature as the Pakes- 
Veraberbeke theorem. 

The natural strategy is to use this approximation to construct an ap- 
propriate importance sampling transition kernel Q(x,dy) [defined in (6)] by 
means of a function v(-) that mimics the behavior of u*(-). An important es- 
timate in the efficiency analysis of our importance sampling scheme involves 
the behavior of v(y) — w(y) as y \ — oo, where w{y) = Ev(y + X). As we 
indicated earlier, if one selects v = u* , then the difference v (y) — w(y) van- 
ishes. Thus, it is natural to expect that the asymptotic behavior of this 
difference will play an important role in the performance of the impor- 
tance sampling estimator. As we shall see, in order to guarantee strong 
efficiency of the importance sampling estimator, it suffices to select v (•) so 
that v(y) — w(y) = o(P(X > —y)) as y \ — oo. 

Recent estimates by Borovkov and Borovkov (2001) under regularly vary- 
ing or semiexponential assumptions provide asymptotics to u*(y) that hold 
with an error of order o(P(X > —y)) as y \ — oo. Under these assump- 
tions, Borovkov and Borovkov (2001) add an additional term to (9) of or- 
der 0(P(X > —y)) to the approximation (9) which yields an error rate 
o(P{X > -y)) as y \ -oo. 

Given the form of (9), it may be surprising at first sight that making 
use only of approximation (9) and assuming only that the distribution of X 
belongs to the class S* one can easily construct v(-) that actually achieves 
an error of order o(P(X > —y)) for the difference v(y) — w(y) as y \ — oo. 
In fact, as we shall prove in our next proposition, v(—t) can be defined as 
the tail probability of a nonnegative random variable Z such that 



(10) P(Z>t)= min 



(EXy L / P{X > s)ds,l 



for t > [this may imply P(Z = 0) > 0]. Then, we write v (y) = P{Z > —y) 
for all y € M. Note that if we could pick u* = v, this would correspond to 
choosing Z = M. Given our representation for v(-) as a tail probability, we 
can write 

w(y) = E[v(y + X)} =P(X + Z> -y). 

The next result shows that this choice of v(-) has the indicated convergence 
rate for the difference v{y) — w{y). However, for the purpose of our efficiency 
analysis, it is the second part of the following result, namely, inequality (11), 
which we shall invoke. 



12 J. BLANCHET AND P. GLYNN 

PROPOSITION 3. Under Assumption A, 

w(y) - v{y) = o(P(X > -y)) 

as V \ —oo. Consequently, for each 7 G (0, 1), there exists a*(7) G (— 00, 0] 
such that, for all y < a* (7), 

(11) v{yf-w{yf 
K ' 7 " P{X>-y)w( y y 

PROOF. We must show that 

P{X + Z>t)- P(Z >t) = o(P(X > t)) 
as t y 00. Note that 

P(X + Z>t) = P(X + Z>t,Z>t) + P(X + Z>t,Z<t) 
= P(Z >t)- P(X + Z<t,Z>t) 
+ P(X + Z>t,Z<t). 
First, we will show that, as t f 00, 

P(X + Z >t,Z <t) ~ P(X > t)EX~/(-EX). 
Let y = mf{t G K : P(Z > t) < 1}. Then, 
P(X + Z>t,Z<t) 

= f P{X>t- s)P(X >s)ds + P(X>t- y )P(Z = y ). 

tjA Jy Q 

We now analyze the integral on the right-hand side of the previous display: 

f P{X>t-s)P{X>s)ds 
Jyo 

rt-yo 

= P{X>t-y -s)P{X>s + y )ds 

Jo 

rt—yo 

= / P(X > t - y - s)P(X > s) ds 
Jo 

rt-yo 

+ / P(X>t-y -s)[P(X>s + y )-P(X>s)]ds. 
Jo 

Let us define by I\ and I2 the two last integrals on the right-hand side of 
the display above. Then, assumption A yields 

rt-yo 

I x = / P(X > t - y - s)P(X > s) ds 
Jo 

~ 2P(X > t)EX + as t / 00. 
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Now, for the integral I2, we have 

ft— yo rs+yo 
I 2 = / p{X >t-y -s)d P(X > u) du 

JO Js 

= - / / P(X > u) duP{X <E ds) 
JO Jt-yo-s 

ft ryo 
+ P(X>0) / P(X>u)du-P(X>t-y ) / P(X>u)du. 

Jt-vn JO 



Note that 

rt-a 



P(X >t- s)y < f P{X>u)du 
fi+yo/t 

= t J P(X >ut-s — y ) du 



< P(X > t - s - y )yo- 
Hence, by virtue of Assumption A, we have that, as t f 00, 

ft-y ft-s 

/ / P(X>u)duP(Xeds)~P(X>t)y P(X>0). 

Jo Jt-yo-s 

Similarly, we obtain that 

P{X>0) f P{X>u)du~P(X>t)y P(X>0) 

Jt—Vn 



lt-yo 

as t y 00, which yields 

rvo 

I 2 ~-P(X>t) / P(X>s)ds. 
Jo 

Combining these estimates, we obtain 

P(X + Z>t,Z <t) 

- (h + h)/(-EX) + P(X>t- y )P(Z = y ) 

rvo 

~ 2P(X > t)EX + /(-EX) - P(X > t) / P(X > s) ds/(-EX) 

Jo 

+ P(X>t-y )P(Z = y ). 

Since 

P(Z = y ) = l- £° P(X > s) ds, 
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we have 



P(X + Z >t;Z <t) 

~ P(X > t)[2EX + + (—EX) - EX 
= P(X >t)EX~ /(-EX). 



}/(-EX) 



On the other hand 



P(X + Z <t,Z>t) 




as t y oo. This yields the proof of the result. □ 

The constant a* that characterizes the region where inequality (11) holds 
will play an important role in the construction of our algorithm. The bound 
(11) indicates that on the region (— oo,a*] the approximation to the zero- 
variance change-of-measure based on v(-) is good enough to control the 
variance of the likelihood ratio in our simulations. Finding a* can be done 
numerically or analytically depending on the problem at hand. For imple- 
mentation, the simulator can choose any value of 7 (for instance, 7 = 1/2) 
or optimize the asymptotic upper bound that we shall obtain in Theorem 
3, which we now are ready to state and prove. 

Consider the importance sampling change-of-measure generated by 



has bounded relative variance as 5o = y \ — 00. 

Theorem 3. Suppose that Assumption A is in force. Fix 7 € (0, 1) and 
select a* =0^(7) G (— 00, 0] as in (11). Then, 




t(0) 

R = I( T (0) < 00) I 
i=i 




E®«* R 2 <(1- 1 )~ l K(a, f y 2 v(y + a,) 2 , 
where «(a*) = mi z >Q[v(z + a*)] = P(Z > —a*). Consequently, 
sup E®1* [R(bf]/P(M > bf < 00. 
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Proof. Define 

h(y) = I(y + a* < 0) + (1 - j)I(y + a* > 0). 

We wish to apply Proposition 2 so we must satisfy bound (7), which in our 
case can be written as 

' v(y + a*) x 



(14) wiy + a^EviX + y + a*)h(X + y) < 



w(y + a*] 



for all y < 0. Here we have used the fact that h(y) = 1 for y < 0. Using the 
interpretation of v(-) as a tail probability, we note that the bound (14) can 
be expressed, for all y < 0, as 

E(h(X + y)-l\X + Z>-y-a,)< < y + a f = W % + ^ . 

w{y + a*y 

Observe that 

h(X + y) - 1 = - 7 J(X > -y - a*)- 
Therefore, it suffices to verify that, for all y < 0, 

-7-P(X > -y - a^X + Z > -y - a*) 
u(y + a*) 2 --w^ + a*) 2 



< 



w(y + a*y 



However, it follows since Z > and using the fact that w{y) = P(X + Z > 
—y), that the previous inequality holds if and only if, for all y < 0, 

i^y + a^-u^y + a*) 2 
P{X > -y - a*)w(y + a*) 

which is true by definition of a* . The conclusion of the result follows directly 
from Propositions 2 and 3, the fact that P(M > b) ~ v{— b + a*) as 6 / oo 
and that the ratio P(M > b)/v(—b + a*) is bounded as a function of b on 
compact sets. □ 



Our approach to the study of the issue of asymptotically vanishing relative 
error will involve taking advantage of extreme value theory; see, for instance, 
Embrechts, Kliippelberg and Mikosch (1997), Section 3.3. We say that X± 
belongs to the domain of attraction of H [denoted by X\ G MDA{H)\ if H 
is nondegenerate and there exists a sequence of constants c n > and d n G R 
(for n > 1) such that 

c^ 1 (max(A"i,...,X n ) - d n ) H 

as n f oo. The random variable H must follow a so-called extreme value dis- 
tribution which, due to the Fisher-Tippett theorem [see Embrechts, Kliippelberg 
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and Mikosch (1997), page 121], can be of only three types. Only the cases 
when H has Frechet distribution, given by 

$ a (x) = exp(-x~ a )I(x > 0), a>0, 

or when H follows a Gumbel distribution described via 

A(x) = exp(— exp(— x)) 

are of interest to us. The class MDA(& a ) is precisely the class of regularly 
varying distributions with index a > [i.e., P{X > x) = x~ a L(x), where 
L(-) is slowly varying at infinity], whereas MDA(A) contains other com- 
monly used heavy-tailed distributions, such as log-normal and Weibull. The 
normalization constants in the definition of H (i.e., the c n 's and (i n 's) depend 
on the so-called auxiliary function, which is defined via 

_ j^p(x 1 >t)dt 
p(x l>x ) • 

The following result of Asmussen and Kluppelberg (1996) provides some 
asymptotic properties of the zero- variance change-of- measure as b f oo. 
These properties will be useful in verifying that our estimator possesses 
asymptotically vanishing relative variance as b f oo. 

Theorem 4 [Asmussen and Kluppelberg (1996)]. Assume either that 
X\ is regularly varying with index a > 1 or that assumption A holds and 
Xi G MDA(A). Then, given S = -b<0 and r(0) < oo, 

( S t{Q)-i r(0) S t(0) \ , vvnjPY , T , 

as b y oo, where V and T are a pair of random variables with joint distri- 
bution P(V > x,T > y) = P(H > x + y) . 

With the previous result in hand, we now can sharpen the conclusion of 
Theorem 3 to obtain asymptotically vanishing relative error. The next result 
provides theoretical justification for the empirical performance discovered in 
the numerical experiments shown in Section 5, in which the accuracy for a 
given number of runs is seen to improve when b gets larger. 

Theorem 5. Assume either that X\ is regularly varying with index a > 
1 or that assumption A holds and X\ € MDA(A). Then, one can choose 
7(6) \ and —a* (b) f 00 as b y 00 such that the estimator provided by 
Algorithm 1 [defined as R in (13)] satisfies 

E%R 2 _ 
pjM > b) 2 ~ L 
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Proof. Under the stated assumptions, £(&) /* oo as b /* oo [see As- 
mussen and Kluppelberg (1996)]. Furthermore, because both X\ and Z are 
long tailed, it follows that, for any constant c> 0, £(b + c) ~ £(&) as b /* oo. 
We start by noting that 



/ r(0) 

E%R 2 = E_ h J(r(0)<oo)n 

= P(M > &)«;(-& + a*) 
/r(0)-l 



u(5fe + a*) 



x £_ 



n 



1 



J=1 w(S fe + a*) u(S T ( ) + a*) 
The Cauchy-Schwarz inequality implies that 

/r(0)-l 



r(0) <oo . 



n 



w(5 fe + a*) 



1 



(15) 



x 



u(5 fc + a*) v(5 r ( ) + a*) 



rr(0)-l 

n 

v i=i 



w(S k + a*) 2 



r(0) < oo 

\ 1/2 



u(5 fe + a*) 2 
1 



r(0) < oo 



r(0) < oo 



1/2 



Consider the first term in (15), which involves the ratios w(Sf- + a*) 2 /v(Sk + 
a*) 2 . We can again use a Lyapunov argument as the one introduced in the 
proof of Theorem 3. In fact, a completely analogous argument as the one 
given there shows that, for each 7 £ (0, 1), there exists a value of a* < for 
which 

/r(0)-l 



E. 



n 



w(S k + a*) 



r(0) <oo < 



1 



1-7 



In fact, one just chooses a* < for which the inequality 

v(y + a*) 2 - w(y + a*) 2 u*(y)P(X > -y - a*) 



< 



(16) 



-P(-X" > -y - a*)w(y + a*) + a*)P(X > — y) 
u(y + a*) 2 - w(y + a*) 2 u*(y)£(-y - a*)v(y + a*) 



P(X > -y- a*)w(y + a*) v{y)£(-y)w(y + a*) 

holds uniformly in y < 0. In view of Proposition 3, it suffices to analyze the 
ratio £(— y)/£(— y — a*). But because £(&) /* 00 as b / 00 and + c) ~ £(&) 
as 6 /* 00, it follows that 

supsup£(-y)/£(-?/ - a) < 00, 

y<0 a<0 
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which is more than what is needed to guarantee that inequality (16) holds 
for a* < sufficiently negative. 

With a* selected as above, observe that 



(17) 



1 



r(0) < o^j 



v(S T(0) +a*) 2 

(q -> i rn\^ , P-b(S T{0) <-a,|r(0)<oo) 
< P_b(S T{0 ) > -a*|r(0) < oo) + P(z>-a*) 2 



By virtue of Theorem 4, we have that the right-hand side of (17) converges 
to 1 as b y oo. We may therefore conclude that, given 7 G (0, 1), there exists 
a selection of a* > for which 

ET h R 2 1 



lim^oo— — ^— --t < 



'P(M>b) 2 -I-7 

Since 7 > is arbitrary, we obtain the result by sending 7 \ and (possibly) 
also sending a* (7) \ —00 at a sufficiently slow rate. □ 

Remark. Although the previous result is intended only to provide a 
theoretical justification for the numerical performance found in our experi- 
ments, one can, in principle, find a computable constant a* < 00 for which 

E%R 2 . 1 



lim^oo— — — — ^ < 



'P(M>b) 2 ~ I-7' 

This is clear from display (16) in the proof of Theorem 5. Note that every- 
thing in the left-hand side of (16) can, in principle, be evaluated, except of 
course u*(y). However, it suffices to find a computable bound for u*(y)/v(y), 
which can be obtained in many different ways, one of them through the use 
of the Lyapunov inequalities. Indeed, note that Theorem 3 and the fact that 
E® b R 2 > u*(—b) 2 (which follows by the Cauchy-Schwarz inequality) could 
be used to obtain a computable upper bound for sup 2/<0 u*(y)/v(y). 

4. The algorithm and complexity analysis. Recall that Z is the nonneg- 
ative r.v. for which 



P(Z >t)= min 



POO 

-(EX)' 1 J P(X>s)ds,l 



The function v(-) is then defined through the relation v(t) = P(Z > —t) 
and w(-) is given by w(y) = P(X + Z > —y). We assume that v and w 
are either available in closed form or can be easily computed numerically. 
Note that in the conventional light-tailed setting, the sampler suggested 
through large deviations approximations to v (and hence, to w) can often be 
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implemented via "exponentially twisting" the increment distribution. Actual 
implementation of an importance sampler based on exponential twisting 
requires that the moment generating function be computable either in closed 
form or through a cheap numerical calculation, and that the correct twisting 
parameter (usually as characterized through a root of the gradient of the log 
moment generating function) be easily computable. Our assumptions on v 
and w can be viewed as the heavy-tailed analog to this requirement in the 
light-tailed case. 

For fixed 7 6 (0, 1), set a* = a* (7) < satisfying (11). We wish to estimate 
u*(-b)=P(T(0)<oo\S = -b), 
for b > 0. Our proposed algorithm proceeds as follows. 

Algorithm 1. 

STEP 1. Initialize s = —b, R = l. 

STEP 2. Set y < — s, generate a random variable Y with law 
P(Y €t + dt) = P(X £ t + dt\X + Z>-y-a*), 
and update s < — y + Y , 

R< — w(y + a^)v(s + a*) -1 /? 

= P(Z + X>-y- a,)P{Z >-s- a*) -1 !?. 
STEP 3. If s > then return R and STOP, otherwise, go to STEP 2. 

Theorem 3 implies that the above algorithm is strongly efficient, in the 
sense that the number of simulation runs required to estimate P(M > b) 
to a given relative accuracy is bounded in b. Within the rare-event simu- 
lation community, this statistical notion (and its close relative logarithmic 
efficiency) is the commonly accepted standard that a good algorithm should 
achieve. 

However, a more demanding notion is to study the computational com- 
plexity of the algorithm. Roughly speaking, the goal is to show that the 
number of floating point operations required to compute P(M > b) to a 
given relative accuracy increases at a slower rate than that associated with 
the use of crude Monte Carlo. [Note that it is typically unrealistic to ex- 
pect that the number of floating point operations can be uniformly bounded 
over b, because the number of r.v.'s required to generate the random ob- 
ject I(M > b) is increasing in b in expectation]. In our current setting, the 
required number of floating point operations is determined by the number 
of simulation runs required to estimate P(M > b) to a given relative accu- 
racy multiplied by the expected number of floating point operations per run. 
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Since Theorem 3 has already established the boundedness of the number of 
simulation runs, the key issue then becomes estimating the expected number 
of floating point operations per run (as a function of b). 

Note that a complete analysis of this issue is impossible without hav- 
ing a model for floating point arithmetic that attaches different costs to 
such computations as special function evaluations (e.g., numerically eval- 
uating the exponential function), generating uniform random variates and 
performing comparison operations. In addition, such a complexity analysis 
requires explicit specification of the numerical effort involved in evaluating v 
and w, both in the case in which closed forms are available and (even more 
critically) in the setting in which a numerical integration scheme is used to 
compute w. These issues arise even in the setting of light-tailed rare-event 
simulation, in which the algorithm typically depends on exponential twist- 
ing. In particular, the issue of numerical computation of the log moment 
generating function, its gradient and the associated roots would be a neces- 
sary element in a complete complexity analysis of a light-tailed rare-event 
simulation algorithm. 

While a complete complexity analysis is no doubt a worthwhile undertak- 
ing, we present here a simplified analysis of what we believe reflect the key 
pragmatic complexity issues. We take the point of view that the expected 
number of floating point operations per run is roughly proportional to the 
total number of uniform random variables generated per run. The expected 
number of uniform random numbers required per run is obtained as the 
product of the number of steps needed by the importance sampler (having 
transition kernel Q a *) to cross level from —b, multiplied by the typical 
number of uniform random variables required to generate each increment of 
the random walk as governed by the kernel Q a *- We shall argue, later in 
this section, that the expected number of steps needed by our importance 
sampler (having transition kernel Q a ,) to cross level from initial posi- 
tion —b grows linearly in b; see Proposition 4. The question of how many 
uniform random variates are required, on average, per increment of Q at is 
very specific to the precise form of the distribution of X and to the in- 
genuity employed in developing a variate generation scheme for simulating 
from Q at 's increment distribution. To illustrate this point, we provide an 
acceptance-rejection algorithm later in this section that uses a bounded (in 
expectation) number of uniform random variates per increment simulated 
(that is bounded both in b and the position x of the sampler) whenever X has 
a regularly varying continuous density (assuming that there exists a variate 
generation scheme that generates X using a finite — in expectation — number 
of uniforms). It follows that the total number of uniform random variates 
required per simulation run grows at a linear rate in b. On the other hand, 
for crude Monte Carlo, the number of simulations runs required to compute 
P(M > b) to a given relative accuracy scales in proportion to 1/P(M > b). 
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Because the paths on which M <b take an infinite number of steps to gen- 
erate, the expected number of floating point operations required per run is 
infinite. [If the simulation is terminated after t steps with t chosen so as 
to produce an estimator bias of a small (and given) magnitude relative to 
P(M > 6), both t and the number of steps increase linearly in b.] Thus, our 
importance sampler provides a substantial improvement in computational 
complexity relative to crude Monte Carlo. 

The following result (whose proof is given at the end of the section) pro- 
vides sufficient conditions to ensure that Algorithm 1 terminates in at most 
0(b) steps, given So = —b. 

Proposition 4. Assume that E(X\]Xi > 0) < oo for some p>l and 
that Assumption A is in place. Then 

E Qa b *r(0) = O(b) 

as b /" oo. 

In order to complete our complexity analysis, it is important to observe 
that the execution of STEP 2 of the algorithm involves a one dimensional 
rare-event type simulation problem. We have assumed that v(-) and w(-) can 
be easily evaluated. Nevertheless, it could be the case that finding explicitly 
the distribution of Y in STEP 2 could be difficult or numerically expensive. 
We shall argue that the variates in STEP 2 can be simulated through a 
suitable acceptance/rejection scheme. Note, however, that one has to design 
the scheme in such a way that the acceptance probability remains uniformly 
bounded (in y) away from zero. By doing this, the generation of the random 
walk increments in STEP 2 under the importance sampling distribution has 
uniformly bounded complexity as b f oo. Consequently, given Proposition 
4, the expected number of variates required to run Algorithm 1 will be of 
order 0(b) as b /"oo. 

Typically, acceptance/rejection schemes, such as those indicated in the 
previous paragraph, although not complicated, must be designed based on 
specific characteristics of the problem at hand. Assume that X has a con- 
tinuous density fx{')- STEP 2 of Algorithm 1 requires sampling a r.v. Y 
with density /y(-) defined, for b > 0, as 

f Y {z;b) = v{-b + z)f x (z)/w(-b). 

The objective is to find an easy way to simulate r.v. Z with computable 
density fg(z; b) such that, for all z € R, 

(18) f Y {z-b)< Pacc {by l fz(z-M, 

where the acceptance probability, p aC c{b), satisfies inffe> p a cc(b) > 0. 
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In order to illustrate the construction of let us assume that fx( 

is regularly varying. We pick 9 G (0, 1) and define 

^ „, v ^, n^ p ( z > eb ) P(X>b-9b) 
c(b) = P(X<b- Ob)^^ + l p (Z>6) ' 

A (6) = c(6)~ 1 P(X < b - 9b)P(Z > 9b)/P(Z > b), 
AiO) = c(b)- l P(X >b- 9b)/P(Z > b). 
Then, the mixture density 

fx(z)I(z<b-9b) 



h(z;b)=XoQ>y 



P(X <b-6b) 



satisfies 



where 



+Ai < W>»W (z>6 - g6) 



f Y (z;b)<mc(b)f~(z;b) 



m > sup[P(Z > b)/P(Z + X > 6)]. 

fe>0 



The acceptance probability using /^(z; b) as proposal is [mc(6)]~ 1 . Using ele- 
mentary properties of regularly varying functions, it follows that inffo>o[c(6) x 
m] _1 > 0. 

We conclude the section with a proof of Proposition 4. 

Lemma 1. Suppose that Assumption A is in force and that E(X^;X\ > 
0) < oo for some p > 1. T/ien i/tere exists to > and e > suc/i i/iai, for all 
t>t , 

E[X\X + Z>t]>e. 

Proof. The assumptions imply that X\ and Z must be subexponential. 
In particular, it follows that P(X + Z > t) ~ P(Z > t) as t /* oo. Thus, it 
suffices to show that 

V 7 P(Z>t) P{Z>t) ) 

The Bounded Convergence Theorem implies that 

- El x-,x + z> t ]_ r /(z>t-») P{X€ds) EX - 



P(Z>t) J-oo P(Z>t) 
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as t y oo. On the other hand, we have that 
E[X + ;X + Z>t] 

= E[X;X + Z >t;X > 0] 

/•OO 

= £7/ /(X + Z>f;X>0;X>s)ds 
Jo 

POO 

= / P(X + Z>t;X>s)ds 
JO 

rt—vo 

= / P(X + Z>t;X>s)ds + P(Z>t-y )(-^), 
Jo 

where y = inf{i G E : P(Z > t) < 1}. Now, 
/•*— wo 

/ P(X + Z>t;X>s)ds 
Jo 

/•t-yo 

= / P(X + Z>t;s<X <t-y )ds 
Jo 

+ {t-y )P(X>t-y ). 

The first integral in the right-hand side of the previous display is greater or 
equal to 

rt-yo 

/ P(Z > t - s)P(s <X <t-y )ds~ P(Z > t)EX + . 
Jo 

On the other hand, it follows that 

fit 

tP(X>t)> J P(X>s)ds 

= [P{Z > t) - P(Z > 2t)](-EX). 

Observe that if E(Xf;X\ > 0) < oo for some p > 0, then there exists 5 > 
such that the map t — > t s P(Z > t) is eventually decreasing. Therefore, 

(20) Irni t ^ 00 P(Z > 2t)/P(Z >t)< (1/2) 5 < 1. 

Putting all the previous estimates together (and using the fact that Z has 
a long tail), we obtain that the limit in (19) is greater or equal to 

-EX~ -EX + EX + - (1 - (l/2) s )EX 

= -(l-(l/2) 5 )EX>0, 

which is more than we need in order conclude the proof of the lemma. □ 

Finally, we provide the proof of Proposition 4. 



24 



J. BLANCHET AND P. GLYNN 



PROOF of Proposition 4. It follows from Lemma 1 and Chebyshev's 
inequality that there exists a < and e > such that 

(21) sup E(X\X + Z>-y -a) >e. 

y<a 

Now, set r(a) = inf{n > 1 : S n > a}. It follows from (21) then that on {r(a) > 
n}, there exists e > such that 

E^*(S n+1 \S n )-S n >e 

and, therefore [letting min(n,r(0)) =n Ar(0)], it is not hard to see that 
M n = S nAT / \ — (n Ar(a))e is a submartingale (under Q a „). 111 particular, we 
obtain that 

eE$ a * [n A r(a)] < E^* S nAT(a) -y<-y. 
Finally, the monotone convergence theorem yields 

E^r(a) <\y\/e. 
On the other hand, we have that 

sup E® a *T(a) < 1 sup E[X + y;X + y < a\X + Z > -y - a*] < oo. 

3/£[a,0] e a<y<0 

Therefore, it follows from a geometric trials argument that 
E Qa b *r(0) 



<E^r(a) + 



sup P(X > — a\X + Z > —y — a* 

a<y<0 



-i 



SUP ^ y 

ye[a,0] 



E® a *T(c 



<b ■ m 

for some m G (0, oo), which yields the proof of the result. □ 

5. Empirical validation. We first consider a class of models for which 
other provably efficient algorithms have been developed. This permits a di- 
rect computational comparison of our algorithm against other existing meth- 
ods. In particular, we shall consider here an M/G/l queue with traffic inten- 
sity p=l/2 and with Pareto service times having tail P(V > t) = (1 + t)~ 2 ' 5 
(we use V to denote a generic service time). As noted in the Introduction, 
two existing competing algorithms for this class of models are the ones pro- 
posed by Asmussen and Kroese (2006) (AK) and Dupuis, Leder and Wang 
(2006) (DLW). AK's procedure was designed to deal with regularly vary- 
ing and Weibull-type service times, whereas DLWs works only for regularly 
varying distributions. There is only one other algorithm that can be shown 
to be strongly efficient for a single-server G/G/l queue with arbitrary re- 
newal arrivals, due to Blanchet, Glynn and Liu (2007) (BGL). It should be 
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noted that it requires regularly varying service times, and hence, does not 
cover the range of G/G/l models covered by this paper's algorithm. The 
BGL procedure does not exploit the representation of the steady-state dis- 
tribution of the G/G/l queue in terms of the maximum of a random walk, 
but instead takes advantage of the regenerative ratio representation for the 
steady-state distribution. 

We have also added a computational comparison against a more refined 
implementation of the algorithm introduced in this paper, which takes ad- 
vantage of the exponential inter-arrival times to help speed up the time it 
takes for the random walk to hit the target set. This refined implementation 
can be found in Blanchet and Li (2006) (BL), and relies on the fact that, for 
an M/G/l queue, the distribution of the ladder heights can be computed 
explicitly. We therefore can apply the algorithm of the current paper to a 
first-passage time problem involving a strictly increasing random walk that 
is killed at a geometric time, thereby basically running the algorithm at the 
level of ladder epochs (and saving the computer time associated with gen- 
erating the transitions between ladder epochs that occurs in the algorithm 
described in this paper). 

Table 1, based on 10,000 samples, compares the performance of our algo- 
rithm (which is denoted in the table below by BG) against the AK, DLW 
and BGL procedures. In our algorithm, we chose a* = 10 and carefully im- 
plemented a numerical integration routine in order to compute w(-); v(-) can 
be evaluated in closed form in terms of an incomplete gamma function. The 
sampling of each of the increments was done using an acceptance rejection 
procedure similar to the one explained in Section 4 right after display (18). 

Perhaps not surprisingly, given that a closely related variant of our esti- 
mator (in which a* can increase with b) exhibits asymptotically vanishing 
relative error (as indicated by Theorem 5), one can see that the coefficient 
of variation displayed for BG and BL diminishes as the level increases. As 
indicated above, the advantage of the BL implementation over the algo- 
rithm discussed here (BG) is that BG requires 0(b) variate generations per 
replication of the estimator, whereas the requirement is 0(1) as b f oo for 
BL — of course, such speed up relies heavily on the assumption of Poisson 
arrivals. It should be noted that the AK and DLW algorithms also require 
O(l) variates per replication for their estimators, and enjoy relatively sim- 
ple implementations, particularly for the case of the AK estimator. Finally, 
we note that the BGL procedure also takes 0(b) variate generations per 
replication of the estimator. 

We conclude this section with a problem instance for which BG is the 
only currently available procedure for efficient tail estimation. In particular, 
consider a G/G/l queue with deterministic inter-arrival times equal to 1 
and a service time tail distribution given by 

P(V >i) =exp(-2t 1/2 ). 
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Table 1 

Numerical estimates for u(So) with Pareto tails and p = 1/2 



[Estimation] 
[Std. error] 



[Conf. nitGrval] 




50 — 


i n 2 

- J. u 


JO — 


i n 3 
- ±u 


<?„ 

JO — 


i n 4 


v(S ) 


9.663e 


-03 


3.151e 


-05 


9.996e 


-07 


AK 


5.995e 


-04 


3.145e 


-05 


9.980e 


-07 




7.395e 


-06 


2.186e 


-07 


6.945e 


-09 




[5.850e 


-04, 


[3.102e 


-05, 


[9.844e 


-07, 




6.140e 


-04] 


3.188e 


-05] 


1.012e 


-06] 


BG 


5.485e 


-04 


3.146e 


-05 


9.980e 


-07 




2.984e 


-06 


9.725e 


-08 


2.073e 


-09 




[5.427e 


-04, 


[3.126e 


-05, 


[9.939e 


-07, 




5.543e 


-04] 


3.165e 


-05] 


1.002e 


-06] 


BL 


9.750e 


-04 


3.162e 


-05 


9.980e 


-07 




4.363e 


-06 


1.982e 


-07 


4.511e 


-09 




[9.664e 


-04, 


[3.123e 


-05, 


[9.892e 


-07, 




9.836e 


-04] 


3.201e 


-05] 


1.007e 


-06] 


BGL 


1.022e 


-03 


3.167e 


-05 


1.128e 


-06 




3.835e 


-05 


1.598e 


-06 


7.280e 


-08 




[9.468e 


-04, 


[2.854e 


-05, 


[9.853e 


-07, 




1.097e 


-03] 


3.480e 


-05] 


1.271e 


-06] 


DLW 


1.046e 


-03 


3.163e 


-05 


9.905e 


-07 




5.195e 


-06 


1.694e 


-07 


2.993e 


-09 




[1.036e 


-03, 


[3.130e 


-05, 


[9.846e 


-07, 




1.056e 


-03] 


3.196e 


-05] 


9.964e 


-07] 



The increment X n has a distribution given by X = V — 1 so that 

poo 

EX= / exp(-2t 1/2 )dt- 1 
J o 

= 1/2 - 1 = -1/2, 

which implies that the traffic intensity is p = 1/2. 

To run the algorithm, we selected a* = —10. We also tried a* = —5 and 
—55. For a* = —5, the sample coefficient of variation was slightly lower than 
the one that we display below, but not too much. For a* = —55, we obtained 
sample coefficients of variation no larger than 100. In both cases, the cor- 
responding pointwise estimates were very consistent with those displayed 
below. 

The most interesting part of the implementation involves Step 2, namely, 
sampling from the r.v. Y with law 

P(Y et + dt) = P(X G t + dt\X +Z>/3). 

For this step we use an acceptance/rejection procedure. Again, we make sure 
that the acceptance probability remains uniformly bounded away from zero 
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over as (3 /* oo. Let m = [P 1 ^ 2 ] and note that 
P(X£t + dt\X + Z>(3) 

P(Z>/3) 



k=0 



+ fxit)!^ 1 ' 2 < t < P) P{Z> o 0) + fx(t)I(t > (3) 



Using the dominating density induced by the expression in the right-hand 
side, we have that Step 2 can be performed in 0(/3 1//2 ) operations [hence, 
a single sample path generated by the proposed algorithm takes at most 
operations] . 

Table 2 illustrates the performance of the algorithm. BG is the estimator 
based on our importance sampling scheme using 20,000 replications. In order 
to validate the implementation of the algorithm, we constructed, for Sq = 
—10, a crude Monte Carlo estimator based on 500,000 replications. The 
estimator was obtained using the regenerative ratio formula [see Asmussen 
(2003), page 268, equation (1.6)]. An approximate 95% confidence interval 
for u(-10) based on these 500,000 samples is [1.862e - 02,2.179e - 02] (the 
point estimate was 2.020e — 02). It is worth noting that the width of our 
importance sampling confidence interval is about 1/2 of that produced by 
crude Monte Carlo, with 25 times fewer samples [for a probability that is 
just moderately small, as is the case of u(— 10)]. We did not apply crude 
Monte Carlo at the other values of So, because of the prohibitive amount of 
computation required. 

The column CV reports the estimated coefficient of variation of our es- 
timator, that is, the (estimated) standard deviation divided by the sample 
mean. 



Table 2 

Numerical estimates for u(So) with Weibull tails and p=l/2 



So v(S ) BG CV 95% Conf. interval 

-10 1.004e-02 1.942e-02 3.68 [1.857e - 02, 2.027e - 02] 

-50 9.577e- 06 1.783e-05 2.40 [1.724e - 05, 1.842e - 05] 

-250 5.666e- 13 7.076e - 13 2.39 [6.842e - 13, 7.310e - 13] 

-500 1.655e - 18 1.897e - 18 3.79 [1.797e - 18, 1.997e - 18] 

-650 3.584e-21 3.971e - 21 2.83 [3.815e - 21,4.127e - 21] 
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